Overview

Dataset Statistics

Number of Variables 37
Number of Rows 1.6702e+06
Missing Cells 1.1109e+07
Missing Cells (%) 18.0%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 1.9 GB
Average Row Size in Memory 1.2 KB
Variable Types
  • Numerical: 19
  • Categorical: 18
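
The missing-cell percentage above can be reproduced from the other figures in this table. The sketch below is a plain arithmetic check, not the report generator's own code. It assumes the exact row count is 1670214, taken from the DAYS_DECISION insight (1670214 negatives, 100.0%); the variable names are illustrative.

```python
# Figures copied from the Dataset Statistics table.
n_variables = 37
n_rows = 1_670_214        # exact count, assumed from the DAYS_DECISION insight
missing_cells = 1.1109e7  # already rounded in the report

# Every row has one cell per variable.
total_cells = n_variables * n_rows
missing_pct = 100 * missing_cells / total_cells

print(f"{total_cells} cells in total, {missing_pct:.1f}% missing")
```

Because the missing-cell count is itself rounded, the result matches the reported 18.0% only to the displayed precision.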

Dataset Insights

SK_ID_PREV is uniformly distributed
RATE_INTEREST_PRIMARY and RATE_INTEREST_PRIVILEGED have similar distributions
DAYS_LAST_DUE and DAYS_TERMINATION have similar distributions
AMT_ANNUITY has 372235 (22.29%) missing values
AMT_DOWN_PAYMENT has 895844 (53.64%) missing values
AMT_GOODS_PRICE has 385515 (23.08%) missing values
RATE_DOWN_PAYMENT has 895844 (53.64%) missing values
RATE_INTEREST_PRIMARY has 1664263 (99.64%) missing values
RATE_INTEREST_PRIVILEGED has 1664263 (99.64%) missing values
NAME_TYPE_SUITE has 820405 (49.12%) missing values
CNT_PAYMENT has 372230 (22.29%) missing values
DAYS_FIRST_DRAWING has 673065 (40.3%) missing values
DAYS_FIRST_DUE has 673065 (40.3%) missing values
DAYS_LAST_DUE_1ST_VERSION has 673065 (40.3%) missing values
DAYS_LAST_DUE has 673065 (40.3%) missing values
DAYS_TERMINATION has 673065 (40.3%) missing values
NFLAG_INSURED_ON_APPROVAL has 673065 (40.3%) missing values
AMT_ANNUITY is skewed
AMT_APPLICATION is skewed
AMT_CREDIT is skewed
AMT_DOWN_PAYMENT is skewed
AMT_GOODS_PRICE is skewed
RATE_DOWN_PAYMENT is skewed
RATE_INTEREST_PRIMARY is skewed
RATE_INTEREST_PRIVILEGED is skewed
SELLERPLACE_AREA is skewed
CNT_PAYMENT is skewed
DAYS_FIRST_DRAWING is skewed
DAYS_FIRST_DUE is skewed
DAYS_LAST_DUE_1ST_VERSION is skewed
DAYS_LAST_DUE is skewed
DAYS_TERMINATION is skewed
FLAG_LAST_APPL_PER_CONTRACT has constant length 1
NFLAG_LAST_APPL_IN_DAY has constant length 1
NFLAG_INSURED_ON_APPROVAL has constant length 3
DAYS_DECISION has 1670214 (100.0%) negatives
SELLERPLACE_AREA has 762675 (45.66%) negatives
DAYS_FIRST_DRAWING has 62705 (3.75%) negatives
DAYS_FIRST_DUE has 956504 (57.27%) negatives
DAYS_LAST_DUE_1ST_VERSION has 678188 (40.6%) negatives
DAYS_LAST_DUE has 785928 (47.06%) negatives
DAYS_TERMINATION has 771236 (46.18%) negatives
AMT_APPLICATION has 392402 (23.49%) zeros
AMT_CREDIT has 336768 (20.16%) zeros
AMT_DOWN_PAYMENT has 369854 (22.14%) zeros
RATE_DOWN_PAYMENT has 369854 (22.14%) zeros
CNT_PAYMENT has 144985 (8.68%) zeros
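
Several columns in the missing-value insights share exactly the same missing count, which hints that they may be absent on the same rows. The sketch below groups the reported counts to make those blocks visible. It only rearranges numbers from the list above; whether the columns really are missing together would need a check against the data itself. The row count of 1670214 is assumed from the DAYS_DECISION insight.

```python
from collections import defaultdict

N_ROWS = 1_670_214  # assumed exact row count (DAYS_DECISION: 100.0% negatives)

# Missing counts copied from the Dataset Insights list.
missing = {
    "AMT_ANNUITY": 372235,
    "AMT_DOWN_PAYMENT": 895844,
    "AMT_GOODS_PRICE": 385515,
    "RATE_DOWN_PAYMENT": 895844,
    "RATE_INTEREST_PRIMARY": 1664263,
    "RATE_INTEREST_PRIVILEGED": 1664263,
    "NAME_TYPE_SUITE": 820405,
    "CNT_PAYMENT": 372230,
    "DAYS_FIRST_DRAWING": 673065,
    "DAYS_FIRST_DUE": 673065,
    "DAYS_LAST_DUE_1ST_VERSION": 673065,
    "DAYS_LAST_DUE": 673065,
    "DAYS_TERMINATION": 673065,
    "NFLAG_INSURED_ON_APPROVAL": 673065,
}

# Group column names by their missing count.
groups = defaultdict(list)
for column, count in missing.items():
    groups[count].append(column)

# Print the groups from most to least missing.
for count, columns in sorted(groups.items(), reverse=True):
    pct = 100 * count / N_ROWS
    print(f"{count:>8} ({pct:5.2f}%): {', '.join(columns)}")
```

Three groups stand out: the two interest-rate columns, the two down-payment columns, and the five DAYS_* columns together with NFLAG_INSURED_ON_APPROVAL.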

Variables


SK_ID_PREV

numerical

Approximate Distinct Count 1670214
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 26723424
Mean 1.9231e+06
Minimum 1000001
Maximum 2845382
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • SK_ID_PREV is uniformly distributed
  • SK_ID_PREV is skewed left (γ1 = -0.0006)

Quantile Statistics

Minimum 1000001
5-th Percentile 1.0956e+06
Q1 1.4715e+06
Median 1.9303e+06
Q3 2.3894e+06
95-th Percentile 2.755e+06
Maximum 2845382
Range 1845381
IQR 917893.5

Descriptive Statistics

Mean 1.9231e+06
Standard Deviation 532597.9587
Variance 2.8366e+11
Sum 3.212e+12
Skewness -0.00057313
Kurtosis -1.1998
Coefficient of Variation 0.2769
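
The "uniformly distributed" insight can be sanity-checked against these statistics. A continuous uniform distribution on [min, max] has mean (min + max) / 2, standard deviation (max - min) / sqrt(12), and excess kurtosis -1.2. The sketch below compares those theoretical values with the reported ones. It is an illustrative check, not the test the report itself ran.

```python
import math

# Values copied from the SK_ID_PREV tables.
minimum, maximum = 1_000_001, 2_845_382
reported_mean = 1.9231e6
reported_std = 532597.9587

# Theoretical moments of a continuous uniform distribution on [minimum, maximum].
uniform_mean = (minimum + maximum) / 2
uniform_std = (maximum - minimum) / math.sqrt(12)

# Coefficient of variation is the standard deviation divided by the mean.
cv = reported_std / reported_mean

print(f"uniform mean {uniform_mean:.1f} vs reported {reported_mean:.1f}")
print(f"uniform std  {uniform_std:.1f} vs reported {reported_std:.1f}")
print(f"coefficient of variation {cv:.4f}")
```

Both theoretical values agree with the reported ones to well under 0.1%, and the reported kurtosis of -1.1998 is close to the uniform value of -1.2.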

SK_ID_CURR

numerical

Approximate Distinct Count 338857
Approximate Unique (%) 20.3%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 26723424
Mean 278357.1741
Minimum 100001
Maximum 456255
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • SK_ID_CURR is skewed left (γ1 = -0.0033)

Quantile Statistics

Minimum 100001
5-th Percentile 118394.7
Q1 189897.25
Median 279474.5
Q3 368611
95-th Percentile 438985.3
Maximum 456255
Range 356254
IQR 178713.75

Descriptive Statistics

Mean 278357.1741
Standard Deviation 102814.8238
Variance 1.0571e+10
Sum 4.6492e+11
Skewness -0.003303
Kurtosis -1.1993
Coefficient of Variation 0.3694
  • SK_ID_CURR is not normally distributed (p-value 3.977355703426066e-07)

NAME_CONTRACT_TYPE

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 129146052

Length

Mean 12.3231
Standard Deviation 2.1189
Median 14
Minimum 3
Maximum 15

Sample

1st row Consumer loans
2nd row Cash loans
3rd row Cash loans
4th row Cash loans
5th row Cash loans

Letter

Count 18912274
Lowercase Letter 17241368
Space Separator 1669868
Uppercase Letter 1670906
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Cash loans, Consumer loans) take over 50.0%
  • The most frequent word (loans) occurs over 2.23 times as often as the second most frequent word (cash)

AMT_ANNUITY

numerical

Approximate Distinct Count 357959
Approximate Unique (%) 27.6%
Missing 372235
Missing (%) 22.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 20767664
Mean 15955.1207
Minimum 0
Maximum 418058.145
Zeros 1637
Zeros (%) 0.1%
Negatives 0
Negatives (%) 0.0%
  • AMT_ANNUITY is skewed right (γ1 = 2.6926)

Quantile Statistics

Minimum 0
5-th Percentile 2810.349
Q1 6396.4125
Median 11250
Q3 20988.3713
95-th Percentile 46238.0872
Maximum 418058.145
Range 418058.145
IQR 14591.9588

Descriptive Statistics

Mean 15955.1207
Standard Deviation 14782.1373
Variance 2.1851e+08
Sum 2.0709e+10
Skewness 2.6926
Kurtosis 15.0698
Coefficient of Variation 0.9265
  • AMT_ANNUITY is not normally distributed (p-value 1.111895649168766e-16)
  • AMT_ANNUITY has 78843 outliers
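
The report does not say which rule it uses to count outliers. A common choice is Tukey's rule, which flags values more than 1.5 × IQR beyond the quartiles. The sketch below applies that rule to the AMT_ANNUITY quartiles as an assumption; `tukey_fences` is an illustrative helper, not part of the reporting tool.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (iqr, lower_fence, upper_fence) for the k * IQR rule."""
    iqr = q3 - q1
    return iqr, q1 - k * iqr, q3 + k * iqr

# Quartiles and 95th percentile copied from the AMT_ANNUITY quantile table.
iqr, lower, upper = tukey_fences(6396.4125, 20988.3713)
p95 = 46238.0872

print(f"IQR = {iqr:.4f}")
print(f"fences = ({lower:.4f}, {upper:.4f})")
print("95th percentile lies above the upper fence:", p95 > upper)
```

Under this assumption the lower fence is negative, so every flagged value would sit in the right tail. The 95th percentile lying above the upper fence is consistent with the reported 78843 outliers, which is about 6.1% of the 1297979 non-missing values.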

AMT_APPLICATION

numerical

Approximate Distinct Count 93885
Approximate Unique (%) 5.6%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 26723424
Mean 175233.8604
Minimum 0
Maximum 6.9052e+06
Zeros 392402
Zeros (%) 23.5%
Negatives 0
Negatives (%) 0.0%
  • AMT_APPLICATION is skewed right (γ1 = 3.3914)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 19966.5
Median 71995.5
Q3 186144.3
95-th Percentile 855000
Maximum 6.9052e+06
Range 6.9052e+06
IQR 166177.8

Descriptive Statistics

Mean 175233.8604
Standard Deviation 292779.7624
Variance 8.572e+10
Sum 2.9268e+11
Skewness 3.3914
Kurtosis 15.7622
Coefficient of Variation 1.6708
  • AMT_APPLICATION is not normally distributed (p-value 4.9425108078866124e-24)
  • AMT_APPLICATION has 205480 outliers

AMT_CREDIT

numerical

Approximate Distinct Count 86803
Approximate Unique (%) 5.2%
Missing 1
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 26723408
Mean 196114.0212
Minimum 0
Maximum 6.9052e+06
Zeros 336768
Zeros (%) 20.2%
Negatives 0
Negatives (%) 0.0%
  • AMT_CREDIT is skewed right (γ1 = 3.2458)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 25060.5
Median 81796.5
Q3 225000
95-th Percentile 900000
Maximum 6.9052e+06
Range 6.9052e+06
IQR 199939.5

Descriptive Statistics

Mean 196114.0212
Standard Deviation 318574.6165
Variance 1.0149e+11
Sum 3.2755e+11
Skewness 3.2458
Kurtosis 14.2387
Coefficient of Variation 1.6244
  • AMT_CREDIT is not normally distributed (p-value 1.1255472416434752e-23)
  • AMT_CREDIT has 170125 outliers

AMT_DOWN_PAYMENT

numerical

Approximate Distinct Count 29278
Approximate Unique (%) 3.8%
Missing 895844
Missing (%) 53.6%
Infinite 0
Infinite (%) 0.0%
Memory Size 12389920
Mean 6697.4021
Minimum -0.9
Maximum 3.06e+06
Zeros 369854
Zeros (%) 22.1%
Negatives 2
Negatives (%) 0.0%
  • AMT_DOWN_PAYMENT is skewed right (γ1 = 36.4765)

Quantile Statistics

Minimum -0.9
5-th Percentile 0
Q1 0
Median 1777.5
Q3 7920
95-th Percentile 27000
Maximum 3.06e+06
Range 3.06e+06
IQR 7920

Descriptive Statistics

Mean 6697.4021
Standard Deviation 20921.4954
Variance 4.3771e+08
Sum 5.1863e+09
Skewness 36.4765
Kurtosis 2901.8262
Coefficient of Variation 3.1238
  • AMT_DOWN_PAYMENT is not normally distributed (p-value 4.243891970370921e-25)
  • AMT_DOWN_PAYMENT has 62863 outliers

AMT_GOODS_PRICE

numerical

Approximate Distinct Count 93885
Approximate Unique (%) 7.3%
Missing 385515
Missing (%) 23.1%
Infinite 0
Infinite (%) 0.0%
Memory Size 20555184
Mean 227847.2793
Minimum 0
Maximum 6.9052e+06
Zeros 6869
Zeros (%) 0.4%
Negatives 0
Negatives (%) 0.0%
  • AMT_GOODS_PRICE is skewed right (γ1 = 3.0737)

Quantile Statistics

Minimum 0
5-th Percentile 23301
Q1 51840
Median 112500
Q3 245157.75
95-th Percentile 904500
Maximum 6.9052e+06
Range 6.9052e+06
IQR 193317.75

Descriptive Statistics

Mean 227847.2793
Standard Deviation 315396.5579
Variance 9.9475e+10
Sum 2.9272e+11
Skewness 3.0737
Kurtosis 12.8663
Coefficient of Variation 1.3842
  • AMT_GOODS_PRICE is not normally distributed (p-value 6.885640052837572e-23)
  • AMT_GOODS_PRICE has 143463 outliers

WEEKDAY_APPR_PROCESS_START

categorical

Approximate Distinct Count 7
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 120584802

Length

Mean 7.1972
Standard Deviation 1.1253
Median 7
Minimum 6
Maximum 9

Sample

1st row SATURDAY
2nd row THURSDAY
3rd row TUESDAY
4th row MONDAY
5th row THURSDAY

Letter

Count 12020892
Lowercase Letter 0
Space Separator 0
Uppercase Letter 12020892
Dash Punctuation 0
Decimal Number 0

HOUR_APPR_PROCESS_START

numerical

Approximate Distinct Count 24
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 26723424
Mean 12.4842
Minimum 0
Maximum 23
Zeros 109
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • HOUR_APPR_PROCESS_START is skewed left (γ1 = -0.0256)

Quantile Statistics

Minimum 0
5-th Percentile 7
Q1 10
Median 12
Q3 15
95-th Percentile 18
Maximum 23
Range 23
IQR 5

Descriptive Statistics

Mean 12.4842
Standard Deviation 3.334
Variance 11.1157
Sum 2.0851e+07
Skewness -0.02563
Kurtosis -0.2778
Coefficient of Variation 0.2671
  • HOUR_APPR_PROCESS_START is not normally distributed (p-value 4.9904135558741846e-05)
  • HOUR_APPR_PROCESS_START has 1639 outliers

FLAG_LAST_APPL_PER_CONTRACT

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 110234124
  • The largest value (Y) is over 196.08 times larger than the second largest value (N)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row Y
2nd row Y
3rd row Y
4th row Y
5th row Y

Letter

Count 1670214
Lowercase Letter 0
Space Separator 0
Uppercase Letter 1670214
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Y, N) take over 50.0%
  • The most frequent word (y) occurs over 196.08 times as often as the second most frequent word (n)
  • FLAG_LAST_APPL_PER_CONTRACT has words of constant length

NFLAG_LAST_APPL_IN_DAY

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 110234124
  • The largest value (1) is over 282.09 times larger than the second largest value (0)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 1
2nd row 1
3rd row 1
4th row 1
5th row 1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 1670214
  • The top 2 categories (1, 0) take over 50.0%
  • The most frequent word (1) occurs over 282.09 times as often as the second most frequent word (0)
  • NFLAG_LAST_APPL_IN_DAY has words of constant length

RATE_DOWN_PAYMENT

numerical

Approximate Distinct Count 207033
Approximate Unique (%) 26.7%
Missing 895844
Missing (%) 53.6%
Infinite 0
Infinite (%) 0.0%
Memory Size 12389920
Mean 0.07964
Minimum -1.4979e-05
Maximum 1
Zeros 369854
Zeros (%) 22.1%
Negatives 2
Negatives (%) 0.0%
  • RATE_DOWN_PAYMENT is skewed right (γ1 = 2.1077)

Quantile Statistics

Minimum -1.4979e-05
5-th Percentile 0
Q1 0
Median 0.06182
Q3 0.1089
95-th Percentile 0.3028
Maximum 1
Range 1
IQR 0.1089

Descriptive Statistics

Mean 0.07964
Standard Deviation 0.1078
Variance 0.01163
Sum 61668.3608
Skewness 2.1077
Kurtosis 6.2044
Coefficient of Variation 1.3539
  • RATE_DOWN_PAYMENT is not normally distributed (p-value 1.0171251654488927e-20)
  • RATE_DOWN_PAYMENT has 43138 outliers

RATE_INTEREST_PRIMARY

numerical

Approximate Distinct Count 148
Approximate Unique (%) 2.5%
Missing 1664263
Missing (%) 99.6%
Infinite 0
Infinite (%) 0.0%
Memory Size 95216
Mean 0.1884
Minimum 0.03478
Maximum 1
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • RATE_INTEREST_PRIMARY is skewed right (γ1 = 5.1969)

Quantile Statistics

Minimum 0.03478
5-th Percentile 0.1424
Q1 0.1607
Median 0.1891
Q3 0.1933
95-th Percentile 0.1969
Maximum 1
Range 0.9652
IQR 0.03261

Descriptive Statistics

Mean 0.1884
Standard Deviation 0.08767
Variance 0.007686
Sum 1120.9118
Skewness 5.1969
Kurtosis 28.1798
Coefficient of Variation 0.4655
  • RATE_INTEREST_PRIMARY is not normally distributed (p-value 1.3650756197979379e-14)
  • RATE_INTEREST_PRIMARY has 234 outliers

RATE_INTEREST_PRIVILEGED

numerical

Approximate Distinct Count 25
Approximate Unique (%) 0.4%
Missing 1664263
Missing (%) 99.6%
Infinite 0
Infinite (%) 0.0%
Memory Size 95216
Mean 0.7735
Minimum 0.3732
Maximum 1
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • RATE_INTEREST_PRIVILEGED is skewed left (γ1 = -1.0074)

Quantile Statistics

Minimum 0.3732
5-th Percentile 0.6379
Q1 0.7156
Median 0.8351
Q3 0.8525
95-th Percentile 0.8673
Maximum 1
Range 0.6268
IQR 0.1369

Descriptive Statistics

Mean 0.7735
Standard Deviation 0.1009
Variance 0.01018
Sum 4603.1136
Skewness -1.0074
Kurtosis 0.2544
Coefficient of Variation 0.1304
  • RATE_INTEREST_PRIVILEGED is not normally distributed (p-value 6.92067944169636e-13)
  • RATE_INTEREST_PRIVILEGED has 72 outliers

NAME_CASH_LOAN_PURPOSE

categorical

Approximate Distinct Count 25
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 114095997

Length

Mean 3.3122
Standard Deviation 2.0334
Median 3
Minimum 3
Maximum 32

Sample

1st row XAP
2nd row XNA
3rd row XNA
4th row XNA
5th row Repairs

Letter

Count 5473563
Lowercase Letter 602191
Space Separator 55767
Uppercase Letter 4871372
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (XAP, XNA) take over 50.0%

NAME_CONTRACT_STATUS

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 121740688
  • The largest value (Approved) is over 3.28 times larger than the second largest value (Canceled)

Length

Mean 7.8893
Standard Deviation 0.6442
Median 8
Minimum 7
Maximum 12

Sample

1st row Approved
2nd row Approved
3rd row Approved
4th row Approved
5th row Refused

Letter

Count 13150342
Lowercase Letter 11480128
Space Separator 26436
Uppercase Letter 1670214
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Approved, Canceled) take over 50.0%
  • The most frequent word (approved) occurs over 3.28 times as often as the second most frequent word (canceled)

DAYS_DECISION

numerical

Approximate Distinct Count 2922
Approximate Unique (%) 0.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 26723424
Mean -880.6797
Minimum -2922
Maximum -1
Zeros 0
Zeros (%) 0.0%
Negatives 1670214
Negatives (%) 100.0%
  • DAYS_DECISION is skewed left (γ1 = -1.0531)

Quantile Statistics

Minimum -2922
5-th Percentile -2545
Q1 -1276
Median -576
Q3 -278
95-th Percentile -79
Maximum -1
Range 2921
IQR 998

Descriptive Statistics

Mean -880.6797
Standard Deviation 779.0997
Variance 606996.2907
Sum -1.4709e+09
Skewness -1.0531
Kurtosis -0.03785
Coefficient of Variation -0.8847
  • DAYS_DECISION is not normally distributed (p-value 3.059250538290988e-05)
  • DAYS_DECISION has 29519 outliers

NAME_PAYMENT_TYPE

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 132408157
  • The largest value (Cash through the bank) is over 1.65 times larger than the second largest value (XNA)

Length

Mean 14.2762
Standard Deviation 8.768
Median 21
Minimum 3
Maximum 41

Sample

1st row Cash through the b...
2nd row XNA
3rd row Cash through the b...
4th row Cash through the b...
5th row Cash through the b...

Letter

Count 20704309
Lowercase Letter 17779327
Space Separator 3131745
Uppercase Letter 2924982
Dash Punctuation 8193
Decimal Number 0
  • The top 2 categories (Cash through the bank, XNA) take over 50.0%

CODE_REJECT_REASON

categorical

Approximate Distinct Count 9
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 113624832
  • The largest value (XAP) is over 7.72 times larger than the second largest value (HC)

Length

Mean 3.0301
Standard Deviation 0.6502
Median 3
Minimum 2
Maximum 6

Sample

1st row XAP
2nd row XAP
3rd row XAP
4th row XAP
5th row HC

Letter

Count 5060922
Lowercase Letter 0
Space Separator 0
Uppercase Letter 5060922
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (XAP, HC) take over 50.0%
  • The most frequent word (xap) occurs over 7.72 times as often as the second most frequent word (hc)

NAME_TYPE_SUITE

categorical

Approximate Distinct Count 7
Approximate Unique (%) 0.0%
Missing 820405
Missing (%) 49.1%
Memory Size 64612843
  • The largest value (Unaccompanied) is over 2.39 times larger than the second largest value (Family)

Length

Mean 11.0322
Standard Deviation 3.2879
Median 13
Minimum 6
Maximum 15

Sample

1st row Unaccompanied
2nd row Spouse, partner
3rd row Family
4th row Unaccompanied
5th row Unaccompanied

Letter

Count 9209939
Lowercase Letter 8333429
Space Separator 71549
Uppercase Letter 876510
Dash Punctuation 0
Decimal Number 0
  • The most frequent word (unaccompanied) occurs over 2.39 times as often as the second most frequent word (family)

NAME_CLIENT_TYPE

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 120544751
  • The largest value (Repeater) is over 4.09 times larger than the second largest value (New)

Length

Mean 7.1732
Standard Deviation 1.9843
Median 8
Minimum 3
Maximum 9

Sample

1st row Repeater
2nd row Repeater
3rd row Repeater
4th row Repeater
5th row Repeater

Letter

Count 11980841
Lowercase Letter 10306745
Space Separator 0
Uppercase Letter 1674096
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Repeater, New) take over 50.0%
  • The most frequent word (repeater) occurs over 4.09 times as often as the second most frequent word (new)

NAME_GOODS_CATEGORY

categorical

Approximate Distinct Count 28
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 119898530
  • The largest value (XNA) is over 4.23 times larger than the second largest value (Mobile)

Length

Mean 6.7863
Standard Deviation 5.9592
Median 3
Minimum 3
Maximum 24

Sample

1st row Mobile
2nd row XNA
3rd row XNA
4th row XNA
5th row XNA

Letter

Count 10921322
Lowercase Letter 7012769
Space Separator 288836
Uppercase Letter 3908553
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (XNA, Mobile) take over 50.0%
  • The most frequent word (xna) occurs over 4.23 times as often as the second most frequent word (mobile)

NAME_PORTFOLIO

categorical

Approximate Distinct Count 5
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 114326510

Length

Mean 3.4502
Standard Deviation 0.6489
Median 3
Minimum 3
Maximum 5

Sample

1st row POS
2nd row Cash
3rd row Cash
4th row Cash
5th row Cash

Letter

Count 5762600
Lowercase Letter 1965904
Space Separator 0
Uppercase Letter 3796696
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (POS, Cash) take over 50.0%

NAME_PRODUCT_TYPE

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 115544457
  • The largest value (XNA) is over 2.33 times larger than the second largest value (x-sell)

Length

Mean 4.1794
Standard Deviation 1.5834
Median 3
Minimum 3
Maximum 7

Sample

1st row XNA
2nd row x-sell
3rd row x-sell
4th row x-sell
5th row walk-in

Letter

Count 6373999
Lowercase Letter 3183001
Space Separator 0
Uppercase Letter 3190998
Dash Punctuation 606548
Decimal Number 0
  • The top 2 categories (XNA, x-sell) take over 50.0%
  • The most frequent word (xna) occurs over 2.33 times as often as the second most frequent word (xsell)

CHANNEL_TYPE

categorical

Approximate Distinct Count 8
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 135874585

Length

Mean 16.3516
Standard Deviation 6.4564
Median 16
Minimum 5
Maximum 26

Sample

1st row Country-wide
2nd row Contact center
3rd row Credit and cash of...
4th row Credit and cash of...
5th row Credit and cash of...

Letter

Count 23955068
Lowercase Letter 22062234
Space Separator 2581251
Uppercase Letter 1892834
Dash Punctuation 494690
Decimal Number 0
  • The top 2 categories (Credit and cash offices, Country-wide) take over 50.0%

SELLERPLACE_AREA

numerical

Approximate Distinct Count 2097
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 26723424
Mean 313.9511
Minimum -1
Maximum 4000000
Zeros 60523
Zeros (%) 3.6%
Negatives 762675
Negatives (%) 45.7%
  • SELLERPLACE_AREA is skewed right (γ1 = 529.6198)

Quantile Statistics

Minimum -1
5-th Percentile -1
Q1 -1
Median 4
Q3 90
95-th Percentile 2000
Maximum 4000000
Range 4000001
IQR 91

Descriptive Statistics

Mean 313.9511
Standard Deviation 7127.4435
Variance 5.08e+07
Sum 5.2437e+08
Skewness 529.6198
Kurtosis 296879.7468
Coefficient of Variation 22.7024
  • SELLERPLACE_AREA is not normally distributed (p-value 4.22651408368615e-25)
  • SELLERPLACE_AREA has 263344 outliers

NAME_SELLER_INDUSTRY

categorical

Approximate Distinct Count 11
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 123743859
  • The largest value (XNA) is over 2.15 times larger than the second largest value (Consumer electronics)

Length

Mean 9.0886
Standard Deviation 7.006
Median 8
Minimum 3
Maximum 20

Sample

1st row Connectivity
2nd row XNA
3rd row XNA
4th row XNA
5th row XNA

Letter

Count 14775479
Lowercase Letter 11391395
Space Separator 404470
Uppercase Letter 3384084
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (XNA, Consumer electronics) take over 50.0%
  • The most frequent word (xna) occurs over 2.15 times as often as the second most frequent word (consumer)

CNT_PAYMENT

numerical

Approximate Distinct Count 49
Approximate Unique (%) 0.0%
Missing 372230
Missing (%) 22.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 20767744
Mean 16.0541
Minimum 0
Maximum 84
Zeros 144985
Zeros (%) 8.7%
Negatives 0
Negatives (%) 0.0%
  • CNT_PAYMENT is skewed right (γ1 = 1.5314)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 6
Median 12
Q3 24
95-th Percentile 48
Maximum 84
Range 84
IQR 18

Descriptive Statistics

Mean 16.0541
Standard Deviation 14.5673
Variance 212.2059
Sum 2.0838e+07
Skewness 1.5314
Kurtosis 1.868
Coefficient of Variation 0.9074
  • CNT_PAYMENT is not normally distributed (p-value 4.999527231908628e-13)
  • CNT_PAYMENT has 55903 outliers

NAME_YIELD_GROUP

categorical

Approximate Distinct Count 5
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 117983431

Length

Mean 5.6397
Standard Deviation 2.7333
Median 4
Minimum 3
Maximum 10

Sample

1st row middle
2nd row low_action
3rd row high
4th row middle
5th row high

Letter

Count 9005385
Lowercase Letter 7453740
Space Separator 0
Uppercase Letter 1551645
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (XNA, middle) take over 50.0%

PRODUCT_COMBINATION

categorical

Approximate Distinct Count 17
Approximate Unique (%) 0.0%
Missing 346
Missing (%) 0.0%
Memory Size 138954401

Length

Mean 18.2128
Standard Deviation 8.4147
Median 19
Minimum 4
Maximum 30

Sample

1st row POS mobile with in...
2nd row Cash X-Sell: low
3rd row Cash X-Sell: high
4th row Cash X-Sell: middl...
5th row Cash Street: high

Letter

Count 26233661
Lowercase Letter 22036750
Space Separator 3303743
Uppercase Letter 4196911
Dash Punctuation 414014
Decimal Number 0

DAYS_FIRST_DRAWING

numerical

Approximate Distinct Count 2838
Approximate Unique (%) 0.3%
Missing 673065
Missing (%) 40.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 15954384
Mean 342209.855
Minimum -2922
Maximum 365243
Zeros 0
Zeros (%) 0.0%
Negatives 62705
Negatives (%) 3.8%
  • DAYS_FIRST_DRAWING is skewed left (γ1 = -3.6013)

Quantile Statistics

Minimum -2922
5-th Percentile -227
Q1 365243
Median 365243
Q3 365243
95-th Percentile 365243
Maximum 365243
Range 368165
IQR 0

Descriptive Statistics

Mean 342209.855
Standard Deviation 88916.1158
Variance 7.9061e+09
Sum 3.4123e+11
Skewness -3.6013
Kurtosis 10.9697
Coefficient of Variation 0.2598
  • DAYS_FIRST_DRAWING is not normally distributed (p-value 5.377379162091425e-25)
  • DAYS_FIRST_DRAWING has 62705 outliers
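
Q1, the median, Q3, and the maximum are all 365243, so a single value dominates this column. 365243 days is roughly 1,000 years, which suggests a placeholder rather than a real day offset. That reading is an assumption; the report itself does not state it. The sketch below uses invented toy values (not rows from the dataset) to show how one repeated placeholder can pin the median, and how the summary changes once it is filtered out. `without_placeholder` is a hypothetical helper.

```python
from statistics import median

PLACEHOLDER = 365243  # maximum reported for this column; assumed to be a sentinel

def without_placeholder(values, placeholder=PLACEHOLDER):
    """Return the values with every occurrence of the placeholder removed."""
    return [v for v in values if v != placeholder]

# Invented toy sample: mostly the placeholder, plus two real-looking offsets.
sample = [365243, 365243, -227, 365243, -2922, 365243]

print("median with placeholder:   ", median(sample))
print("median without placeholder:", median(without_placeholder(sample)))
```

The same maximum of 365243 appears in the DAYS_FIRST_DUE, DAYS_LAST_DUE_1ST_VERSION, DAYS_LAST_DUE, and DAYS_TERMINATION tables, so their means and skewness may be affected in the same way.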

DAYS_FIRST_DUE

numerical

Approximate Distinct Count 2892
Approximate Unique (%) 0.3%
Missing 673065
Missing (%) 40.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 15954384
Mean 13826.2693
Minimum -2892
Maximum 365243
Zeros 0
Zeros (%) 0.0%
Negatives 956504
Negatives (%) 57.3%
  • DAYS_FIRST_DUE is skewed right (γ1 = 4.6441)

Quantile Statistics

Minimum -2892
5-th Percentile -2600
Q1 -1603
Median -823
Q3 -406
95-th Percentile -40
Maximum 365243
Range 368135
IQR 1197

Descriptive Statistics

Mean 13826.2693
Standard Deviation 72444.8697
Variance 5.2483e+09
Sum 1.3787e+10
Skewness 4.6441
Kurtosis 19.5705
Coefficient of Variation 5.2397
  • DAYS_FIRST_DUE is not normally distributed (p-value 4.660972401959684e-25)
  • DAYS_FIRST_DUE has 40645 outliers

DAYS_LAST_DUE_1ST_VERSION

numerical

Approximate Distinct Count 4605
Approximate Unique (%) 0.5%
Missing 673065
Missing (%) 40.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 15954384
Mean 33767.7741
Minimum -2801
Maximum 365243
Zeros 705
Zeros (%) 0.0%
Negatives 678188
Negatives (%) 40.6%
  • DAYS_LAST_DUE_1ST_VERSION is skewed right (γ1 = 2.7794)

Quantile Statistics

Minimum -2801
5-th Percentile -2314
Q1 -1216
Median -351
Q3 139
95-th Percentile 365243
Maximum 365243
Range 368044
IQR 1355

Descriptive Statistics

Mean 33767.7741
Standard Deviation 106857.0348
Variance 1.1418e+10
Sum 3.3672e+10
Skewness 2.7794
Kurtosis 5.7261
Coefficient of Variation 3.1645
  • DAYS_LAST_DUE_1ST_VERSION is not normally distributed (p-value 7.436764693466506e-25)
  • DAYS_LAST_DUE_1ST_VERSION has 93865 outliers

DAYS_LAST_DUE

numerical

Approximate Distinct Count 2873
Approximate Unique (%) 0.3%
Missing 673065
Missing (%) 40.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 15954384
Mean 76582.4031
Minimum -2889
Maximum 365243
Zeros 0
Zeros (%) 0.0%
Negatives 785928
Negatives (%) 47.1%
  • DAYS_LAST_DUE is skewed right (γ1 = 1.4105)

Quantile Statistics

Minimum -2889
5-th Percentile -2337
Q1 -1289.4
Median -530
Q3 -68
95-th Percentile 365243
Maximum 365243
Range 368132
IQR 1221.4

Descriptive Statistics

Mean 76582.4031
Standard Deviation 149647.4151
Variance 2.2394e+10
Sum 7.6364e+10
Skewness 1.4105
Kurtosis -0.01045
Coefficient of Variation 1.9541
  • DAYS_LAST_DUE is not normally distributed (p-value 1.1702193975036799e-23)
  • DAYS_LAST_DUE has 211221 outliers

DAYS_TERMINATION

numerical

Approximate Distinct Count 2830
Approximate Unique (%) 0.3%
Missing 673065
Missing (%) 40.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 15954384
Mean 81992.3438
Minimum -2874
Maximum 365243
Zeros 0
Zeros (%) 0.0%
Negatives 771236
Negatives (%) 46.2%
  • DAYS_TERMINATION is skewed right (γ1 = 1.3064)

Quantile Statistics

Minimum -2874
5-th Percentile -2318.7
Q1 -1248
Median -491
Q3 -36
95-th Percentile 365243
Maximum 365243
Range 368117
IQR 1212

Descriptive Statistics

Mean 81992.3438
Standard Deviation 153303.5167
Variance 2.3502e+10
Sum 8.1759e+10
Skewness 1.3064
Kurtosis -0.2933
Coefficient of Variation 1.8697
  • DAYS_TERMINATION is not normally distributed (p-value 1.994207602314736e-23)
  • DAYS_TERMINATION has 225913 outliers

NFLAG_INSURED_ON_APPROVAL

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 673065
Missing (%) 40.3%
Memory Size 67806132
  • The largest value (0.0) is over 2.01 times larger than the second largest value (1.0)

Length

Mean 3
Standard Deviation 0
Median 3
Minimum 3
Maximum 3

Sample

1st row 0.0
2nd row 1.0
3rd row 1.0
4th row 1.0
5th row 1.0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 1994298
  • The top 2 categories (0.0, 1.0) take over 50.0%
  • The most frequent word (00) occurs over 2.01 times as often as the second most frequent word (10); these tokens correspond to the values 0.0 and 1.0 with the decimal point stripped
  • NFLAG_INSURED_ON_APPROVAL has words of constant length

Interactions

Correlations

Missing Values